The time-frequency uncertainty principle states that the product of the temporal and frequency 
extents of a signal cannot be smaller than 1/(4π). We study human ability to simultaneously 
judge the frequency and the timing of a sound. Our subjects often exceeded the uncertainty limit, 
sometimes by more than tenfold, mostly through remarkable timing acuity. Our results establish a 
lower bound for the nonlinearity and complexity of the algorithms employed by our brains in parsing 
transient sounds, rule out simple "linear filter" models of early auditory processing, and highlight 
timing acuity as a central feature in auditory object processing. 




Fourier transformation turns signals "inside out", in
the sense that low frequencies dictate what happens at
long times, while high frequencies are needed to create
fine temporal detail. A specific instance of this general
property is Fourier's uncertainty theorem, which states
that, considering the absolute value squared of a signal
x(t) as a probability distribution in time,

$$P(t) = \frac{|x(t)|^2}{\int |x(t)|^2 \, dt}, \qquad (1)$$

and the absolute value squared of its Fourier transform
$\hat{x}(f)$ as a distribution in frequency,

$$P(f) = \frac{|\hat{x}(f)|^2}{\int |\hat{x}(f)|^2 \, df}, \qquad (2)$$

then the product of the standard deviations

$$\Delta t = \sqrt{\mathrm{var}(t)} \quad \text{and} \quad \Delta f = \sqrt{\mathrm{var}(f)} \qquad (3)$$

is bounded from below [1]:

$$\Delta t \, \Delta f \geq \frac{1}{4\pi}, \qquad (4)$$



whence it is inferred that signals with a small temporal 
extent require many frequencies for their representation. 
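As a numerical check of eqs. (1)-(4), the sketch below discretizes P(t) and P(f) with an FFT and evaluates Δt and Δf for a Gaussian packet, which attains the bound. It is an illustration under assumed parameters: the sampling rate, the 10 ms packet width and the 440 Hz carrier are arbitrary choices, and the carrier is taken as complex (analytic). With a real cosine carrier, |x̂(f)|² would have lobes at ±f0 and the frequency variance would be inflated to roughly f0² + Δf².

```python
import numpy as np


def uncertainty_product(x, fs):
    """Return (delta_t, delta_f): s.d. of P(t) ~ |x(t)|^2 and of P(f) ~ |x_hat(f)|^2.

    Discretized versions of eqs. (1)-(3); x is sampled at fs Hz and must be
    contained well inside the record so the DFT samples its Fourier transform.
    """
    n = len(x)
    t = np.arange(n) / fs
    p_t = np.abs(x) ** 2
    p_t /= p_t.sum()                              # eq. (1), normalized
    mean_t = np.sum(t * p_t)
    delta_t = np.sqrt(np.sum((t - mean_t) ** 2 * p_t))

    f = np.fft.fftfreq(n, d=1 / fs)               # frequencies in FFT output order
    p_f = np.abs(np.fft.fft(x)) ** 2
    p_f /= p_f.sum()                              # eq. (2), normalized
    mean_f = np.sum(f * p_f)
    delta_f = np.sqrt(np.sum((f - mean_f) ** 2 * p_f))
    return delta_t, delta_f


if __name__ == "__main__":
    fs, sigma, f0 = 44_100, 0.010, 440.0          # assumed, illustrative values
    t = np.arange(int(0.2 * fs)) / fs
    # |x|^2 is a Gaussian of s.d. sigma centred at 0.1 s; analytic carrier at f0
    x = np.exp(-((t - 0.1) ** 2) / (4 * sigma**2)) * np.exp(2j * np.pi * f0 * t)
    dt, df = uncertainty_product(x, fs)
    print(4 * np.pi * dt * df)                    # close to 1: the bound (4) is attained
```

Replacing the Gaussian envelope with any other shape in this sketch should give a value of 4πΔtΔf above 1, up to discretization error.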

The theorem refers to the original signal and its Fourier 
transform. In time-frequency analysis one attempts to 
describe a signal in the two-dimensional time-frequency 
plane, akin to a musical score which has time as its 
horizontal axis and frequency as its vertical axis. Here 
the uncertainty principle begets the Gabor limit [1, 5].
In this setting, the emphasis is shifted from the uncer- 
tainties being a property of the signal, to their being 
a property of the transform itself, leading to an impor- 
tant distinction between resolution and precision. Res- 
olution refers to our ability to verify that two objects 
are distinct, while precision refers to our ability to track 
the parameters of a single object, given prior knowledge 
or assurance that it is only one component. This
distinction is well established in optics, where it is
known that the wavelength of light limits resolution, so
two glass beads cannot be resolved as being different if
they are closer together than a wavelength, yet it does
not limit precision, since a single bead can be tracked
with nanometer accuracy. In time-frequency analysis, it
has been proven that first-order operators cannot exceed
the uncertainty bound [2]. However, there are many
nonlinear distributions that can achieve higher precision
than the Gabor limit when applied to isolated signal
components; these include quadratic (Cohen's class)
distributions like Wigner-Ville [3] and Choi-Williams [4],
and many higher-order ones, such as multi-tapered
spectral estimates [5, 6], the Hilbert-Huang distribution
[7], and the reassigned spectrograms. All such
distributions achieve high precision, yet give interfering
results when two signals are closer together than an
uncertainty envelope. Our experimental test is designed
to measure precision directly, not resolution.
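A small numerical sketch can make the precision/resolution distinction concrete. It assumes 44.1 kHz sampling, Gaussian packets with Δt = 10 ms (so Δf = 1/(4πΔt) ≈ 8 Hz), and analytic complex carriers. The spectral centroid stands in here as a simple precision estimator for an isolated component; it is not one of the distributions cited above. It locates the frequency of a single packet far more finely than Δf, yet two packets 3 Hz apart merge into a single spectral peak. The helper names are hypothetical.

```python
import numpy as np

FS = 44_100            # sample rate in Hz (assumed)
SIGMA_T = 0.010        # Gaussian width Δt in s (assumed); Δf = 1/(4πΔt) ≈ 7.96 Hz
T0, DUR = 0.1, 0.2     # packet centre and record length in s (assumed)
t = np.arange(int(DUR * FS)) / FS
envelope = np.exp(-((t - T0) ** 2) / (4 * SIGMA_T**2))   # |envelope|^2 has s.d. SIGMA_T


def packet(f0):
    """Gaussian packet with an analytic carrier at f0 Hz, phase-referenced to T0."""
    return envelope * np.exp(2j * np.pi * f0 * (t - T0))


def power_spectrum(x, pad=8):
    """Zero-padded |FFT|^2, reordered so that frequency increases monotonically."""
    nfft = pad * len(x)
    f = np.fft.fftshift(np.fft.fftfreq(nfft, d=1 / FS))
    p = np.abs(np.fft.fftshift(np.fft.fft(x, n=nfft))) ** 2
    return f, p


def centroid_frequency(x):
    """Precision: mean of P(f) for one isolated component."""
    f, p = power_spectrum(x)
    return float(np.sum(f * p) / np.sum(p))


def count_peaks(x, rel_floor=1e-3):
    """Resolution: number of local maxima of P(f) above a relative floor."""
    _, p = power_spectrum(x)
    mid = p[1:-1]
    peaks = (mid >= p[:-2]) & (mid > p[2:]) & (mid > rel_floor * p.max())
    return int(np.count_nonzero(peaks))


if __name__ == "__main__":
    print(centroid_frequency(packet(440.3)))       # ≈ 440.3, far inside Δf ≈ 8 Hz
    print(count_peaks(packet(440) + packet(443)))  # 1: a 3 Hz split is not resolved
    print(count_peaks(packet(440) + packet(500)))  # 2: a 60 Hz split is resolved
```

The noise-free setting is deliberate: it isolates the structural point that the envelope width limits how well two components can be separated, not how well a single known component can be located.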

A key goal in neuroscience is to establish which algo- 
rithms the brain uses to process perceptual information. 
Psychophysics, by establishing tight bounds on the per- 
formance of our sensory systems, is sometimes able to 
rule out entire families of perceptual algorithms as candi- 
dates when they, as a matter of principle, cannot achieve 
the expected performance [12, 13]. We shall show below
that human subjects can discriminate better, and
occasionally much better, than the uncertainty bounds.
This categorically rules out any first-order operators,
such as the standard sonogram, from consideration, and
puts a stringent bound on the performance of any
candidate algorithm. This is relevant in both scientific
and technical areas (e.g. [H]), as many high-level models
of auditory processing assume that the earliest stage of
auditory processing yields a representation homologous
to the output of a bank of linear filters [16, 17]. In many
applications, such as speech recognition or audio
compression (e.g. MP3 [18]), the first computational
stage consists of generating small sonogram snippets
from the source sound, which become the input to later
stages. Our data suggest this is not a faithful
description of the early steps in auditory transduction
and processing, which appear to preserve much more
accurate information about the timing and phase of
sound components than about their intensity.

FIG. 1: (A) Stimulus and task. In our final task 5,
subjects are asked to discriminate simultaneously whether
the test note (red) is higher or lower in frequency than
the leading note (green), and whether the test note
appears before or after the flanking high note (blue). For
each instance of the task, two numbers are generated (Dt
and Df) and two Boolean responses (left/right, up/down)
are recorded. Tasks 1 through 4 lead to this final task:
task 1 is frequency only (using two flanking notes), task 2
is timing only, task 3 is frequency only but with the
flanking high note (blue) as a distractor, and task 4 is
timing only, with the leading (green) note as a
distractor. (B) Measurement strategy. As task 5 proceeds,
the numbers Df and Dt are drawn as Gaussian random
numbers with variances pref and multf. The smaller these
variances, the harder the task. The variances are
independently controlled by a 2D1U (two-down, one-up)
procedure: when two responses in a row are correct, the
variance is reduced and the task is made harder; the
variance is increased for every wrong response. This
procedure converges to a demanding regime, one in which
the subject makes frequent mistakes, but fewer than 50%.
Data shown from subject qr3zb [21]. (C) Datum
definition. We show in red the time responses of subject
qr3zb; the horizontal axis is Dt, and the vertical axis is 0
(for before) or 1 (for after); we have slightly offset the
data by random amounts from 0 or 1 to be able to
visualize the density of points at any given Dt. In blue,
the psychometric curve that maximizes the likelihood of
the data. The procedure described in 1(B) has converged
to a high density of tests around 0, spanning with high
density the area in which the psychometric curve rises
most rapidly [21].
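The 2D1U rule of Fig. 1(B) can be sketched in a few lines. This is a hedged illustration, not the authors' code. It tracks a single dimension, while the paper runs one staircase per dimension. The observer is hypothetical, with Gaussian internal noise of standard deviation `sigma_obs`, so a trial with offset d is judged correctly with probability Φ(|d|/sigma_obs). The starting scale and the multiplicative step factor are assumed values. Scaling the standard deviation of Dt by a fixed factor is equivalent to scaling its variance by that factor squared.

```python
import math
import random


def p_correct(d, sigma_obs):
    """Hypothetical observer: P(correct before/after call) = Phi(|d| / sigma_obs)."""
    return 0.5 * math.erfc(-abs(d) / (sigma_obs * math.sqrt(2.0)))


def run_2d1u(n_trials, sigma_obs, scale0=0.02, step=1.2, seed=0):
    """Two-down/one-up staircase on the s.d. of Gaussian-distributed offsets Dt.

    Returns a list of (scale, correct) pairs, one per trial.
    """
    rng = random.Random(seed)
    scale, streak, history = scale0, 0, []
    for _ in range(n_trials):
        d = rng.gauss(0.0, scale)                   # offset drawn for this trial
        correct = rng.random() < p_correct(d, sigma_obs)
        history.append((scale, correct))
        if correct:
            streak += 1
            if streak == 2:                         # two correct in a row: harder
                scale /= step
                streak = 0
        else:                                       # any error: easier
            scale *= step
            streak = 0
    return history


if __name__ == "__main__":
    hist = run_2d1u(20_000, sigma_obs=0.003)
    tail = hist[len(hist) // 2:]                    # discard the initial descent
    frac = sum(c for _, c in tail) / len(tail)
    geo = math.exp(sum(math.log(s) for s, _ in tail) / len(tail))
    print(f"fraction correct {frac:.3f}, typical s.d. {1000 * geo:.2f} ms")
```

By Levitt's analysis, a two-down, one-up rule nominally targets the point where two consecutive correct answers are as likely as not, about 70.7% correct. With fixed multiplicative steps, the realized convergence point can drift somewhat from that value. Either way, the subject makes frequent mistakes but fewer than 50%, as the caption states.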

FIG. 2: Summary of main results: discrimination limens
for each test. Each round dot is a completion of Task 5
by a subject on an individual day, with at least 100
presentations. There were 12 subjects totaling 26
individual sessions for Gaussian and 12 sessions for
notelike tests. Blue denotes the Gaussian packet while red
denotes the notelike packet. The two solid lines are the
locus of the relation δtδf = ΔtΔf; any dots below these
curves violate the corresponding uncertainty relation.
Error bars in both dimensions were obtained by
generating 1000 bootstraps from the raw data and
plotting the 25%–75% quartiles. Raw data are provided in
Suppl. Table S1.

We shall carefully distinguish between the physical
attributes of the stimulus and the homologous
psychological quantities. Most relevant will be the
distinction between Δt and Δf, the physical uncertainties
defined by eqns (1)–(3), and δt and δf, the psychological
limens of discrimination in time and frequency. It would
be absolutely trivial to violate the theorem by using an
incorrect definition of δt and δf or an incorrect
evaluation of the bound. Therefore the limens must be
carefully defined to carry the meaning of a standard
deviation, so that the actual number is directly
comparable to the equivalent physical attribute. It is
standard in the literature to operatively define limens of
discrimination through a same-different paradigm. For
many reasons detailed below, but particularly because
same-different is unlike the standard-deviation definition
of the physical Δt and Δf, we shall operatively define δt
and δf through a two-alternative forced-choice
above-or-below paradigm, and then regress by maximum
likelihood the performance data against a psychometric
curve in the form of an error function; the standard
deviation parameter of this error function is our limen.
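A minimal sketch of this limen definition fits the error-function psychometric curve P(after | Dt) = Φ((Dt − μ)/σ), with Φ(z) = [1 + erf(z/√2)]/2, by maximum likelihood and reads off σ as the limen. The fitting method here is an assumption for illustration, since the text does not specify the authors' optimizer. The curve is rewritten as a probit regression Φ(a + b·Dt) and solved by Fisher scoring. The function names and the simulated observer in the usage example are likewise hypothetical.

```python
import math
import random


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Phi(z) = (1 + erf(z / sqrt 2)) / 2, computed via erfc for tail accuracy."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def fit_cumulative_gaussian(offsets, responses, max_iter=100, tol=1e-10):
    """Maximum-likelihood fit of P(response = 1 | D) = Phi((D - mu) / sigma).

    Solved as probit regression Phi(a + b * D) by Fisher scoring;
    returns (mu, sigma) with mu = -a / b and sigma = 1 / |b| (the limen).
    """
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for d, r in zip(offsets, responses):
            eta = a + b * d
            p = min(max(normal_cdf(eta), 1e-12), 1.0 - 1e-12)  # keep away from 0 and 1
            dens = normal_pdf(eta)
            w = dens / (p * (1.0 - p))
            u0 += (r - p) * w            # score: gradient of the log-likelihood
            u1 += (r - p) * w * d
            wi = dens * w                # expected (Fisher) information weight
            i00 += wi
            i01 += wi * d
            i11 += wi * d * d
        det = i00 * i11 - i01 * i01
        da = (i11 * u0 - i01 * u1) / det  # solve the 2x2 system I @ step = score
        db = (i00 * u1 - i01 * u0) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol * max(1.0, abs(b)):
            break
    return -a / b, 1.0 / abs(b)


if __name__ == "__main__":
    rng = random.Random(1)
    true_sigma = 0.003                   # hypothetical 3 ms timing limen
    ds = [rng.uniform(-0.01, 0.01) for _ in range(4000)]
    rs = [1 if rng.random() < normal_cdf(d / true_sigma) else 0 for d in ds]
    mu, sigma = fit_cumulative_gaussian(ds, rs)
    print(f"mu = {1000 * mu:.2f} ms, sigma = {1000 * sigma:.2f} ms")
```

Because σ is the standard deviation of the fitted cumulative Gaussian, it is in the same units, and has the same meaning, as Δt in eq. (3). That is the property the text requires of a limen.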

We test for both limens simultaneously as a single task.
Prior work in the area (e.g. [22–24]) has always
concentrated on comparing measurements of frequency
discrimination limens δf against the physical temporal
duration Δt of the sound packets. This is inadequate for
our purposes on two grounds: first, it treats the two
quantities in the inequality differently, contradicting the
"spirit" of the uncertainty principle; and second, it fails
to verify human ability in an important and ecologically
relevant domain, namely timing acuity.

We study simultaneous time-frequency acuity for two
test stimuli [2T]. The first one is a Gaussian packet, for
which 4πΔtΔf = 1, attaining the bound in the theorem;
our study shows that many subjects display limens such
that 4πδtδf ≪ 1. In most of the subjects, the overall
increase in performance comes from substantial increases
in timing accuracy. One of our subjects, ar4tl, for
example, when tested with notes of Δt = 35 ms attained a
limen of δt = 3 ms, while the frequency performance
degraded, δf > Δf. In our second test we study
perception of a wave packet with a note-like envelope
characterized by a rapid rise and a slow exponential
decay. Such envelopes are suboptimal according to the
uncertainty principle, having a product 4πΔtΔf strictly
bigger than one; in our case, 4πΔtΔf = 5.7079. However,
the performance of our subjects on such packets is just as
good as, if not better than, their performance on the
Gaussian packet; this has broad implications for
understanding early auditory processing.
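The notelike envelope is not specified in this section, so the value 5.7079 cannot be rederived here. As a hedged illustration only, take a hypothetical envelope with a finite rise and an exponential decay, x(t) = t e^{-t/τ} e^{2πi f₀ t} for t ≥ 0. This is an assumed shape, not the authors' stimulus. Its moments under eqs. (1)–(3) follow in closed form, with ν = f − f₀:

```latex
\begin{aligned}
P(t) &\propto t^2 e^{-2t/\tau}
  \;\Rightarrow\; \text{Gamma}\!\left(k=3,\ \theta=\tfrac{\tau}{2}\right),
  \qquad \Delta t^2 = k\theta^2 = \tfrac{3}{4}\tau^2, \\
\hat{x}(f) &= \frac{1}{\left(1/\tau + 2\pi i\,\nu\right)^2}
  \;\Rightarrow\; P(f) \propto \frac{1}{\left(a^2 + \nu^2\right)^2},
  \qquad a = \frac{1}{2\pi\tau}, \\
\Delta f^2 &= \frac{\int \nu^2 \,(a^2+\nu^2)^{-2}\, d\nu}{\int (a^2+\nu^2)^{-2}\, d\nu}
  = \frac{\pi/(2a)}{\pi/(2a^3)} = a^2, \\
4\pi\,\Delta t\,\Delta f &= 4\pi \cdot \frac{\sqrt{3}}{2}\,\tau \cdot \frac{1}{2\pi\tau}
  = \sqrt{3} \approx 1.73 > 1 .
\end{aligned}
```

The asymmetric envelope therefore exceeds the Gaussian's product even in this mild case. Sharpening the onset all the way to a pure exponential e^{-t/τ} turns P(f) into a Lorentzian, whose variance diverges.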

The results from task 5 are summarized in Figure 2. 
Each dot there corresponds to a simultaneous limen
measurement as outlined above. Some subjects performed
several different measurements, never in the same week.
Two extremes are worth discussing in detail. The lowest
blue dot, at the bottom center of the plot, displays the
greatest violation of the principle in our records, by a
factor of about 13. Subject qr3zb showed marked
increases in both frequency and temporal acuity, and
hence the measurement lies below and to the left of the
physical values of Δt, Δf for the Gaussian note (indicated
by the black lines). The subject is a professional
musician and possesses absolute pitch. The second point
to consider is the leftmost point, at the center left of the
diagram, from subject ar4tl. This is the smallest δt limen
in our records; at 3 ms, the subject was able to
discriminate the relative timing of two notes by a factor
of 13 better than their widths. It should be noted that
3 ms is barely more than a single period of the test note,
2.27 ms. However, this subject was unable to estimate
frequency better than its physical extent, as indicated by
the dot lying above the black line that marks the
Gaussian's Δf, so overall this measurement beats the
uncertainty principle only by a factor of 10. The subject
is an electronic musician who microcomposes and works
in precision sound editing.

We can now examine some implications of these data.
First, even though the notelike packet's uncertainty
product is substantially above the minimum, the subjects
seem able to discriminate with it just as well as with the
Gaussian packet, leading to two measurements (the red
dot at the bottom of the graph and the red dot on the
black horizontal line) that beat relative uncertainty by a
factor of 50, δtδf ≈ (1/50)ΔtΔf, and absolute
uncertainty by a factor of 10, 4πδtδf ≈ 1/10. Therefore
we may conclude that the larger uncertainty product of
the test note does not affect the subjects' acuity, at least
in this particular case. Second, the plot shows a number
of different strategies that subjects use to discriminate,
with a remarkable spread: from those who do not achieve
the physical limits in either dimension (1), to those who
have better frequency but worse timing (4), those with
better timing and worse frequency (10), and those who
have both better timing and better frequency
discrimination than the physical values (8). While the
relative number of measurements in each category
undoubtedly reflects the underlying bias of our subject
population, the fact that there are many strategies
should be robust. Even so, there is a noticeable shift of
the cloud to the left of the reference notes: in the
median, subjects perform twice as well in timing
discrimination as the physical value, and 80% of the
Gaussian data and 100% of the notelike data lie in the
δt < Δt half-plane.
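The two factors quoted for the notelike measurements are linked through the stimulus's own product: dividing the relative factor by 4πΔtΔf gives the absolute one. Taking the factor of 50 at face value (it is itself a rounded figure):

```latex
4\pi\,\delta t\,\delta f \;\approx\; \frac{4\pi\,\Delta t\,\Delta f}{50}
  \;=\; \frac{5.7079}{50} \;\approx\; 0.114 \;\approx\; \frac{1}{8.8},
```

which is consistent with the stated absolute factor of about 10. For the Gaussian packet, where 4πΔtΔf = 1, the relative and absolute factors coincide.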

It is important to stress the difficulty of the task.
Preliminary testing on a handful of non-musicians and
amateur musicians yielded no subjects who could beat the
uncertainty principle. Non-musicians found task 3
(frequency discrimination with distractors) difficult,
whereas musicians found it trivial, likely due to their
practice playing instruments (or singing) while
surrounded by other instrumentalists. The ability to
ignore, or otherwise compartmentalize, distractors is
believed to be one of the major sources of the better
uncertainty-product results for musicians in task 5. We
found that composers and conductors achieved the best
results in task 5, consistently beating the uncertainty
principle, often by a factor of 2 or more, whereas
performers were more likely to beat it only by a few
percentage points. Debriefing the subjects suggests that
the necessity of hearing multi-voiced music (both in
frequency and in time) in one's head, and of coaching
others to perform it, led to the improved performance of
conductors and composers.

Early in the last century a number of auditory
phenomena, such as residue pitch and missing
fundamentals, started to indicate that the traditional view
of the hearing process as a form of spectral analysis had
to be revised. In 1951 Licklider [25] laid the foundation
for the temporal theories of pitch perception, in which the
detailed pattern of action potentials in the auditory nerve
is used [26], as opposed to spectral or place theories, in
which the overall amplitude of the activity pattern is
evaluated without detailed access to phase information.
The groundbreaking work of Ronken [22] and Moore [23]
found violations of uncertainty-like products and argued
that they were evidence in favor of temporal models.
However, this line of work was hampered in four ways:
by the lack of the formal foundation in time-frequency
distributions we have today, by concentrating on
frequency discrimination alone, by technical difficulties in
the generation of the stimuli, and, not least, by the lack
of understanding of cochlear dynamics, since at the time
the active cochlear processes had not yet been
discovered. Perhaps for these reasons this groundbreaking
work did not percolate into the community at large, and
as a result most sound analysis and sound processing
tools today continue to use models based on spectral
theories. We believe it is time to revisit this issue.

We have conducted the first direct psychoacoustical
test of the Fourier uncertainty principle in human
hearing, by measuring subject performance on a
simultaneous temporal and frequency discrimination task.
We have meticulously defined our discrimination limens
δt, δf to correspond to standard deviations of a Gaussian
distribution, so as to be directly comparable to the
physical attributes Δt, Δf as they are defined in the
theorem, and to avoid the possibility of a fictitious
violation of the theorem's bound through the use of
incorrect units. We have used as our two paradigmatic
stimuli the Gaussian envelope, which is the optimal
envelope according to the uncertainty theorem, and a
notelike envelope with the sharp onset and shallower
decay characteristic of many natural sounds [27] and of
cochlear responses.

Our data indicate that human subjects often beat the 
bound prescribed by the uncertainty theorem, sometimes 
by factors in excess of 10. This is sometimes accom- 
plished by an increase in frequency acuity, but by and 
large it is temporal acuity that is increased and largely 
responsible for these gains. Our data further indicate that
subject acuity is just as good for a note-like amplitude 
envelope as for the Gaussian, even though theoretically 
the uncertainty product is increased for such waveforms. 
Our study directly rules out many of the simpler models 
of early auditory processing, often used as input to the 
higher-order stages in models of higher auditory function. 
We hope our study will direct further inquiry into which
family of time-frequency analyses (e.g. [H H2]) may be
the mechanism underlying the auditory hyperacuity
displayed by our subjects. Elucidation of such
mechanisms is likely to have wide-ranging applications, 
both in fields where matching human performance is an 
issue, such as speech recognition, as well as those more 
removed, such as radar, sonar and radio astronomy. 

We wish to thank Mayte Suarez-Farinas and Maurizio 
Pellegrino for their algorithmic and psychophysical ex- 
pertise, Tim Gardner for valuable discussions, and Josh 
are available in the Supplementary Information.
Supported in part by NSF grant EF-0928723.